Transformer-based models have gained large popularity and demonstrated promising results in long-term time-series forecasting in recent years. In addition to learning attention in time domain, recent works also explore learning attention in frequency domains (e.g., Fourier domain, wavelet domain), given that seasonal patterns can be better captured in these domains. In this work, we seek to understand the relationships between attention models in different time and frequency domains. Theoretically, we show that attention models in different domains are equivalent under linear conditions (i.e., linear kernel to attention scores). Empirically, we analyze how attention models of different domains show different behaviors through various synthetic experiments with seasonality, trend and noise, with emphasis on the role of softmax operation therein. Both these theoretical and empirical analyses motivate us to propose a new method: TDformer (Trend Decomposition Transformer), that first applies seasonal-trend decomposition, and then additively combines an MLP which predicts the trend component with Fourier attention which predicts the seasonal component to obtain the final prediction. Extensive experiments on benchmark time-series forecasting datasets demonstrate that TDformer achieves state-of-the-art performance against existing attention-based models.
translated by 谷歌翻译
Denoising diffusions are state-of-the-art generative models which exhibit remarkable empirical performance and come with theoretical guarantees. The core idea of these models is to progressively transform the empirical data distribution into a simple Gaussian distribution by adding noise using a diffusion. We obtain new samples whose distribution is close to the data distribution by simulating a "denoising" diffusion approximating the time reversal of this "noising" diffusion. This denoising diffusion relies on approximations of the logarithmic derivatives of the noised data densities, known as scores, obtained using score matching. Such models can be easily extended to perform approximate posterior simulation in high-dimensional scenarios where one can only sample from the prior and simulate synthetic observations from the likelihood. These methods have been primarily developed for data on $\mathbb{R}^d$ while extensions to more general spaces have been developed on a case-by-case basis. We propose here a general framework which not only unifies and generalizes this approach to a wide class of spaces but also leads to an original extension of score matching. We illustrate the resulting class of denoising Markov models on various applications.
translated by 谷歌翻译
从传统上讲,地球系统(例如天气和气候)的预测依赖于具有复杂物理模型的数值模拟,因此在计算中既昂贵又对领域专业知识的需求既昂贵。在过去十年中时空地球观察数据的爆炸性增长中,应用深度学习(DL)的数据驱动模型表明了各种地球系统预测任务的潜力。尽管在其他领域取得了广泛的成功,但作为新兴DL架构的变压器在该领域的采用量有限。在本文中,我们提出了Earthformer,这是一种用于地球系统预测的时空变压器。 Earthformer基于一个通用,灵活和有效的时空注意块,名为Cuboid的注意力。这个想法是将数据分解为立方体,并平行应用立方体级别的自我注意力。这些立方体与全球矢量的集合进一步相关。我们对MovingMnist数据集和新提出的混沌N体MNIST数据集进行了实验,以验证Cuboid注意的有效性,并找出地球形式的最佳设计。关于降水现象和El Nino/Southern振荡(ENSO)预测的两个现实基准测试的实验表明,Earthformer实现了最新的性能。
translated by 谷歌翻译
去核扩散模型最近已成为强大的生成模型类别。它们提供最新的结果,不仅用于无条件模拟,而且还提供了解决在各种反问题中产生的条件模拟问题时。这些模型的一个局限性在于它们在生成时间上是计算密集型的,因为它们需要长期模拟扩散过程。进行无条件的模拟时,Schr \“生成建模的Odinger桥式公式会导致理论上接地的算法缩短生成时间,这与其他提出的加速技术互补。我们将Schr \'Edinger桥式桥式扩展到条件模拟。我们在各种应用程序上演示了这种新颖的方法,包括图像超分辨率,状态空间模型的最佳过滤以及预训练的网络的完善。我们的代码可以在https://github.com/vdeborto/cdsb上找到。
translated by 谷歌翻译
我们提出了一种在线状态估计和参数学习中的变异方法(SSMS),这是一种无处不在的序列数据的潜在变量模型。根据标准批处理变异技术,我们使用随机梯度同时优化对数模型参数的对数证据的下限和状态后分布的变异近似。但是,与现有方法不同,我们的方法能够完全在线运作,因此,尽管联合后分布的维度越来越不断增长,但在合并后不需要重新审视,并且在每个时间步骤中的更新成本保持恒定国家。这是通过利用该联合后验分布及其变异近似的向后分解,并与贝尔曼型递归相结合的证据下限及其梯度来实现。我们在几个示例中证明了该方法的性能,包括高维SSM和顺序变异自动编码器。
translated by 谷歌翻译
The development of social media user stance detection and bot detection methods rely heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, suppressing graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built based on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain and user tweet features as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when introducing multiple relations. By analyzing experiment results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
translated by 谷歌翻译
Image Virtual try-on aims at replacing the cloth on a personal image with a garment image (in-shop clothes), which has attracted increasing attention from the multimedia and computer vision communities. Prior methods successfully preserve the character of clothing images, however, occlusion remains a pernicious effect for realistic virtual try-on. In this work, we first present a comprehensive analysis of the occlusions and categorize them into two aspects: i) Inherent-Occlusion: the ghost of the former cloth still exists in the try-on image; ii) Acquired-Occlusion: the target cloth warps to the unreasonable body part. Based on the in-depth analysis, we find that the occlusions can be simulated by a novel semantically-guided mixup module, which can generate semantic-specific occluded images that work together with the try-on images to facilitate training a de-occlusion try-on (DOC-VTON) framework. Specifically, DOC-VTON first conducts a sharpened semantic parsing on the try-on person. Aided by semantics guidance and pose prior, various complexities of texture are selectively blending with human parts in a copy-and-paste manner. Then, the Generative Module (GM) is utilized to take charge of synthesizing the final try-on image and learning to de-occlusion jointly. In comparison to the state-of-the-art methods, DOC-VTON achieves better perceptual quality by reducing occlusion effects.
translated by 谷歌翻译
Dynamic treatment regimes assign personalized treatments to patients sequentially over time based on their baseline information and time-varying covariates. In mobile health applications, these covariates are typically collected at different frequencies over a long time horizon. In this paper, we propose a deep spectral Q-learning algorithm, which integrates principal component analysis (PCA) with deep Q-learning to handle the mixed frequency data. In theory, we prove that the mean return under the estimated optimal policy converges to that under the optimal one and establish its rate of convergence. The usefulness of our proposal is further illustrated via simulations and an application to a diabetes dataset.
translated by 谷歌翻译
As natural language processing (NLP) for gender bias becomes a significant interdisciplinary topic, the prevalent data-driven techniques such as large-scale language models suffer from data inadequacy and biased corpus, especially for languages with insufficient resources such as Chinese. To this end, we propose a Chinese cOrpus foR Gender bIas Probing and Mitigation CORGI-PM, which contains 32.9k sentences with high-quality labels derived by following an annotation scheme specifically developed for gender bias in the Chinese context. Moreover, we address three challenges for automatic textual gender bias mitigation, which requires the models to detect, classify, and mitigate textual gender bias. We also conduct experiments with state-of-the-art language models to provide baselines. To our best knowledge, CORGI-PM is the first sentence-level Chinese corpus for gender bias probing and mitigation.
translated by 谷歌翻译
Off-policy evaluation (OPE) is a method for estimating the return of a target policy using some pre-collected observational data generated by a potentially different behavior policy. In some cases, there may be unmeasured variables that can confound the action-reward or action-next-state relationships, rendering many existing OPE approaches ineffective. This paper develops an instrumental variable (IV)-based method for consistent OPE in confounded Markov decision processes (MDPs). Similar to single-stage decision making, we show that IV enables us to correctly identify the target policy's value in infinite horizon settings as well. Furthermore, we propose an efficient and robust value estimator and illustrate its effectiveness through extensive simulations and analysis of real data from a world-leading short-video platform.
translated by 谷歌翻译